EDA Final Report: Physiological Linkages & Empathic Accuracy

Final Project
Data Science 1 with R (STAT 301-1)

Author

Angela Zhong

Published

December 12, 2024

Github Repo Link

1 Introduction

Empathy, the capacity to resonate with others’ emotions and offer compassion, is fundamental to fostering healthy familial relationships and supportive interactions between parents and children (Yoo et al., 2013). Understanding empathy in these interactions is crucial for promoting better family dynamics and improving child well-being. Parents’ empathic capacity is positively associated with children’s attachment security, openness, and perception of parental warmth (Stern et al., 2014).

Empathy is a multifaceted construct comprising cognitive, affective, and physiological dimensions (Yoo et al., 2013). Empathic accuracy, or the ability to accurately understand another’s emotions, is often studied as a cognitive skill. However, research in psychophysiology suggests an alternative route of empathy—through physiological processes. People empathize by mirroring others’ autonomic nervous system responses in their own bodies, a phenomenon referred to as physiological linkage (Preston & de Waal, 2002).

In this EDA project, I mainly examined whether greater physiological linkage between caregivers and youth predicts higher empathic accuracy in caregivers. Of particular interest were caregivers of youth at clinical high risk (CHR) for psychosis, as empathy in these relationships could play a significant role in youth functioning and clinical outcomes. By studying the connection between physiological synchrony and empathic accuracy in caregiver-youth dyads, I hope to eventually contribute to a better understanding of the mechanisms underlying empathy and its potential implications for interventions aimed at improving parent-child relationships, especially in CHR populations.

2 Data Source

The data for this project were collected as part of a larger research project, Emotions in Adolescence (EIA; Fattal et al., 2024). Conducted by the Lifespan Development Lab at Northwestern SESP, the EIA study focuses on real-life dyadic interaction tasks involving youth at clinical high risk (CHR) for psychosis and their caregivers. Explicit permission to use the data has been obtained, but due to IRB requirements and the protection of participants’ privacy, the raw data cannot be disclosed.

2.1 Research Methodology

In EIA, youth participants and one of their caregivers engaged in face-to-face videotaped conversations on a topic of disagreement, while their autonomic physiology (including electrocardiogram to assess Interbeat Interval) was monitored continuously.

After the conversations, adolescents and parents separately completed a video recall session in which they watched the videotaped conflict conversation twice. Using a Rating Dial, they provided continuous ratings of (a) their own feelings on the first viewing and (b) how they thought their interaction partner was feeling on the second viewing. Their autonomic physiology was also monitored continuously during the recall session.

Participants then completed a self-report questionnaire that asked them to report their own emotions and their perceptions of their conversation partner’s emotions during the conversation.

See Figure 1 for a visual illustration of the experimental methodology.

Figure 1: method

Empathic accuracy is calculated as the second-by-second cross-correlation between caregivers’ ratings of youth’s emotional experiences and youth’s self-rated emotional experiences, both taken from the Rating Dial Data (RDD) for the conflict conversation. Higher scores indicate greater accuracy.

Physiological linkage is calculated by the second-by-second cross-correlations between caregiver’s interbeat interval (IBI; i.e., time interval between two consecutive R peaks) when watching the conversation video and youth’s IBI during the conflict conversation, differentiating between in-phase and anti-phase linkage.
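The linkage computation described above can be sketched in Python as follows. This is a minimal illustration, not the lab's linkage tool and not the R code behind this report. It assumes a centred 15-second window (the window length is taken from the data-quality note in Section 2.2.4) and assumes that in-phase and anti-phase linkage are the positive and negative parts of the windowed Pearson correlation, which is consistent with the minima of 0 in Table 3. The function names and the example values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a flat window has no defined correlation
    return sxy / sqrt(sxx * syy)


def rolling_linkage(caregiver, youth, window=15):
    """Centred rolling correlation, one value per second (None at both edges)."""
    half = window // 2
    out = []
    for t in range(len(caregiver)):
        if t < half or t + half >= len(caregiver):
            out.append(None)  # not enough seconds on one side of t
            continue
        lo, hi = t - half, t + half + 1
        out.append(pearson(caregiver[lo:hi], youth[lo:hi]))
    return out


def split_phase(r):
    """Split a correlation into non-negative (in-phase, anti-phase) parts (assumed definition)."""
    if r is None:
        return None, None
    return max(r, 0.0), max(-r, 0.0)


# Made-up IBI series (20 seconds) purely for illustration.
caregiver_ibi = [800 + 10 * ((t * 7) % 5) for t in range(20)]
youth_ibi = [700 + 5 * ((t * 7) % 5) for t in range(20)]
linkage = rolling_linkage(caregiver_ibi, youth_ibi)
phases = [split_phase(r) for r in linkage]
```

With a 15-second centred window, the first 7 and last 7 seconds of each series have no linkage value, which is why those seconds need explicit handling downstream.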

2.2 Data overview & quality

Multiple datasets were involved in this project:

  • IBI Caregiver-Youth Compiled.CSV: contains the second-by-second raw IBI data for caregivers during the empathy trial (rewatching the conversation video) and youth during the actual conflict conversation

  • RDD Caregiver-Youth Compiled.CSV: contains the second-by-second raw RDD data for caregivers’ ratings of the youth’s emotional experiences and youth’s self-rated emotional experiences

  • EIA SQ.CSV: contains data of both dyads’ self-reported emotions from the short questionnaire filled by dyads after the conversation

  • IBI Linkage + RDD Linkage Compiled.CSV: contains physiological linkage and empathic accuracy data derived from raw IBI and raw RDD data.

2.2.1 Raw IBI data

Data were cleaned (by myself) and compiled so that only dyads with both the caregiver’s and the youth’s IBI data available were kept. Dyads in which one or both members had missing or unusable IBI data were discarded.

  • 56,400 observations in total.
  • Variables:
    • Task (character): Indicates the task type (e.g., “conflict conversation”).
    • Second (numeric): Time variable specifying the second in the task.
    • Dyad_ID (numeric): Unique identifier for each dyad.
    • Individual_ID (numeric): Unique Identifier for individuals within dyads.
    • IBI (numeric): Inter-beat interval values in milliseconds.

Quality Issues: Potential challenges include outlier/extreme IBI values, which may affect the analysis of physiological linkage or variability across individuals and dyads.
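The dyad-level filtering and the outlier concern can be sketched in Python as below. This is a hedged illustration rather than the cleaning script actually used: it assumes each row is a dictionary with the column names listed above, that missing values are stored as None, and that a dyad is complete when exactly two individuals contribute data. The 300 to 2000 range is a placeholder threshold in the units of Table 2, not a value taken from the study.

```python
from collections import defaultdict


def complete_dyads(rows, value_key="IBI"):
    """Keep only rows from dyads in which two individuals have usable values."""
    members = defaultdict(set)
    for row in rows:
        if row[value_key] is not None:
            members[row["Dyad_ID"]].add(row["Individual_ID"])
    keep = {dyad for dyad, ids in members.items() if len(ids) == 2}
    return [row for row in rows if row["Dyad_ID"] in keep]


def flag_extreme(values, low=300.0, high=2000.0):
    """Return True for each value outside the (hypothetical) plausible range."""
    return [not (low <= v <= high) for v in values]


# Made-up rows: dyad 1 is complete, dyad 2 has one member without usable data.
rows = [
    {"Dyad_ID": 1, "Individual_ID": 11, "Second": 1, "IBI": 850.0},
    {"Dyad_ID": 1, "Individual_ID": 12, "Second": 1, "IBI": 790.0},
    {"Dyad_ID": 2, "Individual_ID": 21, "Second": 1, "IBI": 910.0},
    {"Dyad_ID": 2, "Individual_ID": 22, "Second": 1, "IBI": None},
]
kept = complete_dyads(rows)
flags = flag_extreme([r["IBI"] for r in kept] + [2945.0])
```

Flagging rather than deleting extreme values keeps the decision about how to treat them (exclusion, winsorizing, or manual review) separate from the filtering step.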

2.2.2 Raw RDD data

Data were cleaned (by myself) and compiled so that only dyads with both the caregiver’s and the youth’s RDD data available were kept. Dyads in which one or both members had missing or unusable RDD data were discarded.

  • 61,200 observations in total.
  • Variables:
    • Task (character): Indicates the task type (e.g., “conflict conversation”).
    • Second (numeric): Time variable specifying the second in the task.
    • Dyad_ID (numeric): Unique identifier for each dyad.
    • Individual_ID (numeric): Unique Identifier for individuals within dyads.
    • RDD (numeric): Rating dial scores, recorded once per second.

Quality Issues: Potential challenges include outlier/extreme RDD values, which may affect the analysis of empathic accuracy or variability across individuals and dyads. Naming inconsistencies (e.g., case differences in variable names) and organizational inconsistencies between the IBI and RDD files could also create complications during data merging or analysis.
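One way to guard against the naming inconsistencies mentioned above is to normalise column names before joining the two files. The Python sketch below is illustrative only; the join keys (dyad, individual, second) are an assumption based on the variable lists above, and the helper names are hypothetical.

```python
def normalise_keys(row):
    """Lower-case and trim column names so 'Dyad_ID' and 'dyad_id' match."""
    return {key.strip().lower(): value for key, value in row.items()}


def merge_on(left, right, keys=("dyad_id", "individual_id", "second")):
    """Inner-join two lists of row dictionaries on the normalised key columns."""
    index = {tuple(r[k] for k in keys): r for r in map(normalise_keys, right)}
    merged = []
    for row in map(normalise_keys, left):
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            merged.append({**row, **match})
    return merged


# Made-up rows with inconsistent capitalisation across the two files.
ibi_rows = [{"Dyad_ID": 1, "Individual_ID": 11, "Second": 1, "IBI": 850.0}]
rdd_rows = [{"dyad_id": 1, "individual_id": 11, "second": 1, "RDD": 2.4}]
combined = merge_on(ibi_rows, rdd_rows)
```

An inner join silently drops unmatched rows, so comparing row counts before and after the merge is a cheap check that nothing unexpected was lost.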

2.2.3 Self-reported emotions data

Data was cleaned (by other lab members) and compiled so that dyads with inconsistent, mismatched, or largely missing data were discarded. However, dyads with some instances of missing data were still kept.

  • 457 observations in total.
  • Variables
    • demographic information (ID, dyad_id, CHR status, task, etc.; numeric, factor, and character)
    • response timing (StartDate, EndDate)
    • emotional checklist (e.g., amusement, anger, calm, disgust; numeric): 10-point Likert scale measure of emotions felt during the conversation
    • strongest_self (character): self-report of strongest emotion felt by self during the conversation
    • strongest_partner (character): report of perceived strongest emotion felt by conversation partner during the conversation
    • other unused variables

Quality Issues: Potential missing data in emotion ratings (e.g., NA values in compassion). Data cleanliness issues with redundant or irrelevant variables (e.g., IPAddress, survey metadata). Variability in the subjective reports of strongest emotions (due to the open-ended nature of the question) makes it harder to separate useful information from noise.

2.2.4 IBI + RDD Linkage data

IBI and RDD linkage data were computed on a second-by-second basis using the overall, in-phase, anti-phase, and absolute linkage analysis tools.

  • 20,400 observations in total.
  • Variables:
    • Identifiers:
      • Dyad_ID (numeric): Unique identifier for dyads.
      • Child_ID, Caregiver_ID (numeric): Unique identifiers for children and caregivers.
      • Second (numeric): Timestamps in seconds.
    • Linkage Metrics (IBI and RDD) (numeric): Pearson correlations capturing different linkage aspects and corresponding p-values for each metric
      • overall_linkage_pearson_correlation (numeric)
      • absolute_linkage_pearson_correlation (numeric)
      • inphase_linkage_pearson_correlation (numeric)
      • antiphase_linkage_pearson_correlation (numeric)
    • Clinical Condition:
      • CHR (factor): Binary indicator (e.g., 1 for CHR condition).

Quality Issues: Due to the nature of 15-second windows used in calculating linkage metrics, there is no data for the first 7 seconds of each conversation, which needs proper handling in the analysis. Correlation metrics are segmented by overall, absolute, in-phase, and anti-phase components, reflecting nuanced physiological synchronization.

3 Explorations

3.1 Univariate Analyses

3.1.1 CHR status

The sample in this study consists of 48 dyads in the Clinical High Risk (CHR) group and 54 dyads in the Healthy Control (HC) group, totaling 28,800 and 32,400 observations, respectively.

The CHR group includes youth who are at an elevated risk for developing psychotic disorders due to experiencing attenuated or subthreshold psychosis symptoms. These youth were recruited based on the presence of these symptoms, and their caregivers were also included in the study.

In contrast, the HC group consists of youth who do not meet the criteria for CHR, alongside their caregivers, serving as a comparison group.

This distinction between the CHR and HC groups is crucial for exploring potential group differences in physiological and emotional responses, particularly in understanding how these factors may relate to empathic accuracy and physiological linkage.

Refer to Table 1 and Figure 2 for the sample size in each condition (CHR vs. control).

Table 1: Summary of CHR Status
CHR      Count
CHR      48
Control  54

Figure 2

3.1.2 Raw IBI and RDD

I calculated the descriptive statistics (mean, SD, median, range) for both IBI and RDD across Caregivers and Youth (See Table 2). These statistics offer insights into the central tendency and variability of these physiological measures and empathic accuracy measures. This step provides a foundation for understanding how these variables behave within the sample before further analyses.

Table 2: Descriptive Statistics for IBI and RDD
Measure Min Max Mean SD Median
IBI 304.799356 2945.000000 862.082668 145.7976396 854.427921
RDD 0.020361 7.924431 2.463937 0.9930137 2.412883
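The summary columns in Table 2 can be reproduced with a few lines of code. The Python sketch below is an illustration (the report's own analysis was done in R); it uses the sample standard deviation (n - 1 denominator), which matches what R's sd() computes, and the input values are made up.

```python
import statistics


def describe(values):
    """Return the five summary statistics reported in Table 2."""
    return {
        "Min": min(values),
        "Max": max(values),
        "Mean": statistics.mean(values),
        "SD": statistics.stdev(values),  # sample SD, n - 1 denominator
        "Median": statistics.median(values),
    }


# Made-up rating dial values purely for illustration.
summary = describe([1.0, 2.0, 3.0, 4.0])
```

Comparing the mean with the median, as in Table 2, is a quick way to spot skew introduced by extreme values such as the 2945 maximum for IBI.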

The density plots below visualize the distribution of IBI and RDD for both the Youth and Caregiver groups.

Figure 3

Figure 3 shows the distribution of Interbeat Intervals (IBI) for both caregivers and youth, with caregivers’ distribution slightly broader and peaking at a higher IBI value compared to youth.

Figure 4

Figure 4 illustrates the distribution of Rating Dial Data (RDD) for caregivers and youth, where both groups exhibit similar patterns, but caregivers have a slightly higher peak at lower RDD values.

While both graphs reveal similar overall patterns between caregivers and youth, the IBI plot shows a more pronounced difference in peak and range, whereas the RDD distributions are closely aligned, suggesting greater similarity in rating dial responses than in interbeat intervals.

3.1.3 IBI linkage and RDD linkage

I also calculated the descriptive statistics (mean, SD, median, range) for IBI and RDD linkage across the overall, in-phase, and anti-phase conditions. The results below (Table 3) showed that IBI overall linkage had a mean of 0.033, with in-phase linkage at 0.178 and anti-phase linkage at 0.145, indicating low synchrony overall, with the in-phase condition showing slightly higher synchrony. For RDD linkage, the overall mean was 0.063, with in-phase linkage at 0.256 and anti-phase linkage at 0.193, similarly reflecting modest synchrony and a higher in-phase value.

Table 3: Descriptive Statistics for IBI and RDD Linkage Types (Overall, In-phase, and Anti-phase)
Measure                 Min         Max        Mean       SD         Median
IBI Overall Linkage     -0.9515327  0.9977606  0.0328899  0.3881127  0.0386137
IBI In-phase Linkage     0.0000000  0.9977606  0.1777241  0.2321558  0.0386137
IBI Anti-phase Linkage   0.0000000  0.9515327  0.1448343  0.2127238  0.0000000
RDD Overall Linkage     -0.9999995  1.0000000  0.0631561  0.5225202  0.0792397
RDD In-phase Linkage     0.0000000  1.0000000  0.2564916  0.3129383  0.0792397
RDD Anti-phase Linkage   0.0000000  0.9999995  0.1933355  0.2755251  0.0000000
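A quick consistency check on Table 3: if in-phase linkage is the positive part of the overall correlation and anti-phase linkage is the magnitude of its negative part (an assumption, not something the linkage tools document here), then the overall mean should equal the in-phase mean minus the anti-phase mean. The Python sketch below tests that using only the means printed in Table 3.

```python
# Means copied from Table 3.
table3_means = {
    "IBI": {"overall": 0.0328899, "inphase": 0.1777241, "antiphase": 0.1448343},
    "RDD": {"overall": 0.0631561, "inphase": 0.2564916, "antiphase": 0.1933355},
}

# Difference between in-phase and anti-phase means for each measure.
differences = {
    measure: round(m["inphase"] - m["antiphase"], 7)
    for measure, m in table3_means.items()
}

# Gap between the reported overall mean and the computed difference.
gaps = {
    measure: abs(differences[measure] - table3_means[measure]["overall"])
    for measure in table3_means
}
```

Both gaps are within rounding of the seven printed decimals, so the table is consistent with that decomposition. The matching medians (in-phase equals overall, anti-phase equals zero) point the same way.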

The plots below (Figure 5 and Figure 6) illustrate the overall distribution of IBI and RDD linkage across caregiver-youth dyads. While the distributions for both measures are relatively spread out, the in-phase condition exhibits the most concentrated distribution, suggesting that caregivers and youth demonstrate higher synchrony during in-phase interactions.

Figure 5
Figure 6

Physiological Linkage (IBI Linkage) was broken down into in-phase (see Figure 7) and anti-phase linkage (see Figure 8).

Tip

(In-phase linkage): synchronous changes in heart rate between the two members of a dyad (i.e., their IBIs increase or decrease in the same direction); indicates stronger emotional connection or mutual understanding between individuals

(Anti-phase linkage): oppositional pattern of heart rate fluctuations between two individuals (i.e., one person’s IBI increases while the other’s decreases, and vice versa); associated with emotional discord and moments of tension between the individuals

3.1.3.1 IBI In-phase linkage

Figure 7

3.1.3.2 IBI Anti-phase linkage

Figure 8

3.1.4 Self-reported emotions checklist

In Table 4, I summarized the numeric intensity ratings for each emotion on the checklist across all dyads. I also examined the self-reported strongest emotion felt during the conversation, after first converting the emotion variables into factors.

Table 4: Summary of Emotional Checklist by Emotions
Emotion Mean SD Min Max
affection 4.3436123 2.520729 0 8
amusement 4.2511013 2.435651 0 8
anger 0.8153846 1.569598 0 8
calm 4.6989011 2.166906 0 9
compassion 4.1607930 2.496471 0 9
disgust 0.5494505 1.400627 0 8
embarrassment 0.8000000 1.691388 0 8
excitement 3.2175824 2.550572 0 8
fear 0.6351648 1.477763 0 8
gratitude 3.8615385 2.747156 0 8
love 4.5406593 2.653648 0 9
pride 2.2373626 2.512684 0 9
sadness 0.8285714 1.641041 0 8
shame 0.5714286 1.407523 0 8
surprise 1.6989011 2.179070 0 8

To simplify the analysis, I grouped the top 10 most frequently reported strong emotions and combined all other emotions into an “others” category in Figure 9 and Figure 10. This approach helps clarify the data and focus on the most common emotional experiences. In both self-reports and partner ratings, “calm” and “affection” were the most frequently reported emotions, which is especially interesting given the context of a conflict conversation. Exploring this further could reveal nuances in how individuals perceive and regulate their emotions during interactions, even in potentially tense situations.

Figure 9
Figure 10

3.2 Bivariate Analysis

3.2.0.1 Physiological Linkage (In-phase vs. Anti-phase) & CHR status

I compared the physiological linkage between the CHR (Clinical High Risk) and HC (Healthy Control) groups for both in-phase and anti-phase conditions. The side-by-side boxplots in Figure 11 show differences in the spread and central tendencies of linkage values between the groups, with in-phase physiological linkage being more variable and higher than anti-phase linkage in both conditions. Overall, the HC group exhibited higher physiological linkage in both in-phase and anti-phase conditions compared to the CHR group, though no significant difference was found between the medians of the two groups.

Figure 11

3.2.0.2 Empathic Accuracy & CHR status

I also compared Empathic Accuracy (measured through Rating Dial Data, RDD) between the CHR (Clinical High Risk) and HC (Healthy Control) groups (see Figure 12). Surprisingly, the CHR group exhibited a slightly higher median empathic accuracy compared to the HC group and showed a more centered distribution. These findings could provide intriguing insights that warrant further exploration in the multivariate analysis.

Figure 12

3.2.1 Strongest Emotion & CHR status

I used a chi-square test of independence to examine whether the self-reported strongest emotion felt during the conversation is associated with CHR status (Clinical High Risk vs. Healthy Control).

The chi-square test in Table 5 revealed a significant association between CHR status and the self-reported strongest emotion felt during the conversation, with a p-value less than 2.2e-16, indicating that the distribution of strongest emotions differs significantly between individuals at clinical high risk for psychosis and healthy controls.
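For readers unfamiliar with the test, the chi-square statistic compares observed cell counts with the counts expected under independence. The Python sketch below computes the statistic and its degrees of freedom for a made-up 2 x 2 table; the counts are hypothetical and are not the study's data. A p-value printed as "< 2.2e-16" is the smallest value R reports by default, so it should be read as "extremely small" rather than as an exact figure.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    Assumes every row and every column has at least one observation.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows are CHR / HC, columns are two emotion categories.
stat, df = chi_square([[10, 20], [20, 10]])
```

With many emotion categories and a modest sample, some expected counts can be small, in which case the chi-square approximation becomes less reliable and the result is best treated as exploratory.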

Table 5

3.3 Multivariate Analyses

3.3.1 Visualizations

The visualizations presented here (Figures 13 to 16) illustrate the relationship between empathic accuracy and physiological linkage in both in-phase and anti-phase conditions. The first scatterplot in each section (Figure 13 and Figure 15) shows the second-by-second relationship between physiological linkage and empathic accuracy, with separate lines for the HC (Healthy Control) and CHR (Clinical High Risk) groups to reveal potential group differences. The second scatterplot in each section (Figure 14 and Figure 16) presents averaged dyad-level data to provide a clearer picture of the overall relationship, with outliers identified and highlighted for further investigation. Together, these plots show both second-level and dyad-level patterns, enabling a more nuanced understanding of how these variables interact across conditions.
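The dyad-level averaging behind the second scatterplot in each section can be sketched as follows. This Python illustration assumes one row per dyad per second with hypothetical column names "linkage" and "accuracy"; seconds with a missing value in either column are skipped. The rule used to identify outliers is not specified in the report and is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def dyad_means(rows, keys=("linkage", "accuracy")):
    """Average second-by-second values within each dyad, skipping incomplete seconds."""
    grouped = defaultdict(list)
    for row in rows:
        if all(row[k] is not None for k in keys):
            grouped[row["Dyad_ID"]].append(row)
    return {
        dyad: {k: mean(r[k] for r in dyad_rows) for k in keys}
        for dyad, dyad_rows in grouped.items()
    }


# Made-up second-level rows for two dyads.
rows = [
    {"Dyad_ID": 1, "Second": 8, "linkage": 0.25, "accuracy": 0.5},
    {"Dyad_ID": 1, "Second": 9, "linkage": 0.75, "accuracy": 0.0},
    {"Dyad_ID": 2, "Second": 8, "linkage": 0.5, "accuracy": None},
    {"Dyad_ID": 2, "Second": 9, "linkage": 0.5, "accuracy": 1.0},
]
averages = dyad_means(rows)
```

Averaging reduces each dyad to a single point, which removes the within-dyad dependence from the plot at the cost of discarding moment-to-moment variation.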

3.3.1.1 Empathic accuracy VS. In-phase physiological linkage

Figure 13
Figure 14

3.3.1.2 Empathic accuracy VS. Anti-phase physiological linkage

Figure 15
Figure 16

3.3.2 Multi-model Linear regression analyses

3.3.2.1 Overall Goal of the Multivariate Analyses

The goal of these analyses was to explore how physiological linkage, as measured by interbeat interval (IBI) linkage, predicts empathic accuracy in dyads (either combined or separated by clinical high-risk (CHR) status). I also wanted to examine whether CHR status moderated this relationship, with particular focus on interactions between physiological linkage and CHR status. Given the complexity of this analysis, multiple models were fitted, starting from simple linear regression models and progressing to more complex linear mixed-effects models (LMM) to account for variance at the dyad level.

3.3.2.2 Model Choices and Rationale

I started with simple linear regression models (SLMs) to establish a basic relationship between physiological linkage (IBI) and empathic accuracy. These simple models were then expanded by including CHR status as a covariate to explore whether it had a main effect or interacted with physiological linkage. This interaction term was tested to determine if the relationship between physiological linkage and empathic accuracy varied across the CHR and HC groups.

However, a simple linear regression treats every observation as independent, an assumption that does not hold when there are repeated measures from the same group (dyad). To refine the analysis further, I used linear mixed-effects models (LMMs), which can account for both fixed effects (e.g., the influence of condition or group membership like HC vs. CHR) and random effects (e.g., the variability between dyads), giving a more nuanced and accurate analysis. This is important because the relationship between physiological linkage and empathic accuracy might differ not just by CHR status but also by individual dyads.
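To see why ignoring the dyad structure can change the estimated slope, the Python sketch below compares a pooled least-squares slope with a slope computed after centring each variable on its dyad mean. Dyad-mean centring is a simpler "within" estimator, swapped in here for illustration; it is not a mixed-effects model and not the analysis reported below. The data are made up so that the relationship is positive within each dyad but negative when dyads are pooled.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Least-squares slope of y on x (single predictor with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def within_dyad_slope(rows):
    """Slope after subtracting each dyad's own means from x and y."""
    by_dyad = defaultdict(list)
    for dyad, x, y in rows:
        by_dyad[dyad].append((x, y))
    xs, ys = [], []
    for obs in by_dyad.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        xs += [x - mx for x, _ in obs]
        ys += [y - my for _, y in obs]
    return ols_slope(xs, ys)


# Made-up (dyad, linkage, accuracy) rows.
rows = [
    ("A", 0.1, 0.5), ("A", 0.2, 0.6), ("A", 0.3, 0.7),
    ("B", 0.6, 0.1), ("B", 0.7, 0.2), ("B", 0.8, 0.3),
]
pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
within = within_dyad_slope(rows)
```

In this toy example the pooled slope is negative while the within-dyad slope is positive, because differences between dyads dominate the pooled fit. A mixed model with a random intercept per dyad addresses the same problem while also estimating how much dyads differ.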

3.3.2.3 Presenting the Results

The results of these models are summarized in two tables (one for in-phase, one for anti-phase linkage) that report a beta coefficient and a p-value for each model. The beta coefficients provide insight into the direction and strength of the relationships, while the p-values indicate statistical significance (evaluated at p < 0.05 or p < 0.01).

By presenting the results in this summarized and organized way, I emphasized the key outcomes only to avoid overwhelming the reader with excessive statistical details, focusing instead on the most relevant findings. This allows readers to quickly grasp the core insights from the analyses without unnecessary distractions.

3.3.2.4 In-phase Regression Results

As shown in Table 6, in the combined HC & CHR group, the simple linear model (SLM) revealed a significant negative relationship between in-phase physiological linkage and empathic accuracy (β = -0.0397, p = 0.014). However, the linear mixed model (LMM) did not show a significant effect (β = -0.0159, p = 0.324). With CHR included as a covariate, the SLM again showed a significant negative relationship (β = -0.0370, p = 0.022). In the models that added a linkage × CHR interaction, neither the SLM nor the LMM reached significance on the reported coefficient (SLM: β = -0.0392, p = 0.055; LMM: β = -0.0237, p = 0.249). Note that the SLM coefficient in this row is identical to the HC-only SLM slope, so it most likely reflects the linkage slope in the HC reference group rather than the interaction term itself.

For the HC-only group, neither the SLM nor the LMM showed significant relationships between physiological linkage and empathic accuracy (SLM: β = -0.0392, p = 0.060; LMM: β = -0.0240, p = 0.254). Similarly, in the CHR-only group, no significant effects were found (SLM: β = -0.0332, p = 0.194; LMM: β = -0.0026, p = 0.918).

Table 6: Empathic Accuracy and In-phase Physiological Linkage
In-phase Linear Regression Results
Model Overview                                   Results
Model Type  Group              Interaction       beta          p-value
SLM         HC & CHR Combined  None              -0.03971185   0.01399561
LMM         HC & CHR Combined  None              -0.01595495   0.3241941
SLM         CHR as Covariate   None              -0.03697828   0.02209476
SLM         HC & CHR Combined  CHR Interaction   -0.03923016   0.05492753
LMM         HC & CHR Combined  CHR Interaction   -0.02370664   0.248506
SLM         HC only            None              -0.03923016   0.05966409
LMM         HC only            None              -0.0240067    0.2538406
SLM         CHR only           None              -0.03322653   0.1944531
LMM         CHR only           None              -0.002591045  0.9183938

3.3.2.5 Anti-phase Regression Results

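As shown in Table 7, in the combined HC & CHR group for the anti-phase models, neither the SLM nor the LMM showed a significant relationship between physiological linkage and empathic accuracy (SLM: β = 0.0283, p = 0.108; LMM: β = 0.0312, p = 0.075). The model including CHR as a covariate did not produce significant results either (SLM: β = 0.0300, p = 0.089). In the models that added a linkage × CHR interaction, the reported coefficients were not significant (SLM: β = -0.0080, p = 0.721; LMM: β = -0.0243, p = 0.274). As in the in-phase models, the SLM coefficient in this row is identical to the HC-only SLM slope, so it most likely reflects the linkage slope in the HC reference group rather than the interaction term itself.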

In the HC-only group, no significant effects were found for either the SLM or the LMM (SLM: β = -0.0080, p = 0.726; LMM: β = -0.0240, p = 0.292). In contrast, for the CHR-only group, both the SLM and LMM revealed significant positive relationships between physiological linkage and empathic accuracy (SLM: β = 0.0925, p = 0.0009; LMM: β = 0.1227, p < 0.0001), indicating a robust association in this group.

Table 7: Empathic Accuracy and Anti-phase Physiological Linkage
Antiphase Linear Regression Results
Model Overview                                   Results
Model Type  Group              Interaction       beta          p-value
SLM         HC & CHR Combined  None              0.02833373    0.1081708
LMM         HC & CHR Combined  None              0.03115654    0.07521094
SLM         CHR as Covariate   None              0.02996803    0.08910531
SLM         HC & CHR Combined  CHR Interaction   -0.007990003  0.7205816
LMM         HC & CHR Combined  CHR Interaction   -0.02427965   0.2743254
SLM         HC only            None              -0.007990003  0.7257005
LMM         HC only            None              -0.02396787   0.2923468
SLM         CHR only           None              0.09253463    0.0008863752
LMM         CHR only           None              0.1226989     7.316664e-06
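In Tables 6 and 7, the SLM coefficient in the "CHR Interaction" row equals the HC-only SLM slope, which is what an ordinary least-squares model with a full linkage × CHR interaction produces for the linkage term when HC is the reference group. Under that assumption (and assuming the interaction model was fitted to the same rows as the group-specific models), the interaction coefficient itself equals the CHR-only slope minus the HC-only slope. The Python sketch below computes these implied values from the tabled SLM slopes; they are derived quantities, not reported results, and no p-values can be recovered this way.

```python
# SLM slopes copied from Table 6 (in-phase) and Table 7 (anti-phase).
slm_slopes = {
    "in-phase": {"HC": -0.03923016, "CHR": -0.03322653},
    "anti-phase": {"HC": -0.007990003, "CHR": 0.09253463},
}

# Implied interaction coefficient: how much the CHR slope differs from the HC slope.
implied_interaction = {
    condition: slopes["CHR"] - slopes["HC"]
    for condition, slopes in slm_slopes.items()
}
```

The implied difference is close to zero for in-phase linkage and roughly 0.10 for anti-phase linkage, which matches the pattern in the group-specific rows: only the CHR group shows a clear positive anti-phase slope. The same shortcut does not hold exactly for the LMMs.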

4 Conclusion

The analysis of both self-reported emotions and physiological linkage provides valuable insights into the dynamics of empathic accuracy and emotional regulation within dyads, particularly in the context of individuals at Clinical High Risk (CHR) for psychosis and Healthy Controls (HC).

On a univariate level, self-report emotion data revealed that “calm” and “affection” were the most frequently reported emotions by both self-reports and partner ratings, suggesting that even in potentially tense situations, individuals may still experience and express emotions of warmth and composure. This finding highlights the complexity of emotional regulation in conflict situations and warrants further exploration into how individuals perceive and manage emotions during interactions.

Bivariate analyses revealed that physiological linkage, measured through IBI, showed group differences in variability across in-phase and anti-phase conditions, although no significant differences were found in the central tendency of the linkage measures between the groups. Empathic accuracy, assessed through Rating Dial Data (RDD), provided further insight into emotional processes. Surprisingly, the CHR group demonstrated slightly higher empathic accuracy than the HC group, which challenges expectations and may point to compensatory mechanisms in individuals at clinical risk for psychosis.

Multivariate analyses, including linear regression and linear mixed-effects models (LMMs), explored the relationship between physiological linkage and empathic accuracy. For in-phase linkage, the simple regression models revealed a significant negative relationship with empathic accuracy in the combined HC and CHR group, but the effect was no longer significant once dyad-level variance was accounted for with an LMM, highlighting the importance of considering dyadic variability in such analyses. For anti-phase linkage, no significant relationship emerged in the combined group. Analyses by group showed no significant relationships in the HC group for either linkage type, or in the CHR group for in-phase linkage. The one exception was a significant positive relationship between anti-phase linkage and empathic accuracy in the CHR group, found in both the SLM and the LMM.

Taken together, these findings suggest a complex and nuanced interaction between physiological responses and empathic accuracy in different groups, with no straightforward pattern emerging between physiological linkage and empathic accuracy across different groups and conditions. Future research should delve deeper into understanding the underlying mechanisms that explain these group differences and explore potential factors that may influence empathic processes in individuals at clinical high risk for psychosis. Additionally, the surprising finding that individuals at clinical high risk for psychosis showed higher empathic accuracy deserves further attention, as it may provide new insights into compensatory mechanisms or emotional regulation strategies that differ from those in healthy controls.

5 References

Fattal, J., Martinez, M., Gupta, T., Stephens, J. E., Haase, C. M., & Mittal, V. A. (2024). Disrupted coherence between autonomic activation and emotional expression in individuals at clinical high risk for psychosis. Journal of Psychopathology and Clinical Science.

Kerem, E., Fishman, N., & Josselson, R. (2001). The experience of empathy in everyday relationships: cognitive and affective elements. Journal of Social and Personal Relationships, 18(5), 709–729. https://doi.org/10.1177/0265407501185008

Preston, S. D., & de Waal, F. B. (2002). Empathy: Its ultimate and proximate bases. Behavioral and Brain Sciences, 25, 1-20.

Stern, J., Borelli, J. L., & Smiley, P. A. (2014). Assessing parental empathy: a role for empathy in child attachment. Attachment & Human Development, 17(1), 1–22. https://doi.org/10.1080/14616734.2014.969749

Yoo, H., Feng, X., & Day, R. D. (2013). Adolescents’ empathy and prosocial behavior in the family context: a longitudinal study. Journal of Youth and Adolescence, 42(12), 1858–1872. https://doi.org/10.1007/s10964-012-9900-6